A statistical basis for testing the significance of mass spectrometric protein identification results.
Authors
Abstract
A method for testing the significance of mass spectrometric (MS) protein identification results is presented. MS proteolytic peptide mapping and genome database searching provide a rapid, sensitive, and potentially accurate means for identifying proteins. Database search algorithms detect matches between proteolytic peptide masses from an MS peptide map and theoretical proteolytic peptide masses of the proteins in a genome database. The number of matching masses is used to compute a score, S, for each protein, and the protein that yields the best score is taken as the identification result. There is a risk of obtaining a false result, because masses determined by MS are not unique; i.e., each mass in a peptide map can randomly match one or several proteins in a genome database. A false result is obtained when the score, S, due to random matching cannot be distinguished from the score due to matching with a real protein in the sample. We therefore introduce the frequency function, f(S), for false (random) identification results as a basis for testing at what significance level, α, one can reject a null hypothesis, H0: "the result is false". The significance is tested by comparing an experimental score, S(E), with a critical score, S(C), required for a significant result at the level α. If S(E) ≥ S(C), H0 is rejected. f(S) and S(C) were obtained by simulations utilizing random tryptic peptide maps generated from a genome database. The critical score, S(C), was studied as a function of the number of masses in the peptide map, the mass accuracy, the degree of incomplete enzymatic cleavage, the protein mass range, and the size of the genome. With S(C) known for a variety of experimental constraints, significance testing can be fully automated and integrated with database searching software used for protein identification.
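The testing procedure described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' implementation. It assumes a synthetic "database" of uniformly distributed peptide masses in place of a real tryptic digest of a genome, takes the score S to be the plain count of matched masses, and uses a fixed mass tolerance in Da. All function names and parameter values (`count_matches`, `simulate_false_scores`, `critical_score`, the 0.5 Da tolerance, the database size) are hypothetical and chosen only for illustration.

```python
import bisect
import random
from collections import Counter


def count_matches(map_masses, protein_masses, tol):
    """Count map masses lying within +/- tol of any mass in the sorted protein list."""
    hits = 0
    for m in map_masses:
        i = bisect.bisect_left(protein_masses, m - tol)
        if i < len(protein_masses) and protein_masses[i] <= m + tol:
            hits += 1
    return hits


def best_score(map_masses, database, tol):
    """Score S of the top-ranked protein (assumption: S = number of matched masses)."""
    return max(count_matches(map_masses, p, tol) for p in database)


def simulate_false_scores(database, n_masses, tol, n_trials, rng):
    """Best scores for random peptide maps whose masses come from unrelated proteins."""
    pool = [m for protein in database for m in protein]
    return [
        best_score(rng.sample(pool, n_masses), database, tol)
        for _ in range(n_trials)
    ]


def frequency_function(false_scores):
    """Empirical f(S): fraction of random maps whose best score equals S."""
    n = len(false_scores)
    return {s: c / n for s, c in sorted(Counter(false_scores).items())}


def critical_score(false_scores, alpha):
    """Smallest score S(C) such that the fraction of false scores >= S(C) is <= alpha."""
    n = len(false_scores)
    s = min(false_scores)
    while sum(x >= s for x in false_scores) / n > alpha:
        s += 1
    return s


def is_significant(s_e, s_c):
    """Reject H0 ("the result is false") when S(E) >= S(C)."""
    return s_e >= s_c


def demo():
    rng = random.Random(0)
    # Synthetic stand-in for a digested genome: 200 proteins, 20 peptide masses each.
    database = [
        sorted(rng.uniform(800.0, 3000.0) for _ in range(20)) for _ in range(200)
    ]
    false_scores = simulate_false_scores(
        database, n_masses=10, tol=0.5, n_trials=300, rng=rng
    )
    s_c = critical_score(false_scores, alpha=0.05)
    print("f(S):", frequency_function(false_scores))
    print("S(C) at alpha = 0.05:", s_c)


if __name__ == "__main__":
    demo()
```

Rerunning `simulate_false_scores` with a different number of masses, tolerance, or database size gives a rough picture of how S(C) depends on those constraints, which is the kind of dependence the abstract says was studied. Modeling incomplete cleavage or a restricted protein mass range would require a real digest model, which this sketch leaves out.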
Related articles
A model of random mass-matching and its use for automated significance testing in mass spectrometric proteome analysis.
A rapid and accurate method for testing the significance of protein identities determined by mass spectrometric analysis of protein digests and genome database searching is presented. The method is based on direct computation using a statistical model of the random matching of measured and theoretical proteolytic peptide masses. Protein identification algorithms typically rank the proteins of a...
The statistical significance of protein identification results as a function of the number of protein sequences searched.
The potential for obtaining a true mass spectrometric protein identification result depends on the choice of algorithm as well as on experimental factors that influence the information content in the mass spectrometric data. Current methods can never prove definitively that a result is true, but an appropriate choice of algorithm can provide a measure of the statistical risk that a result is fa...
Assigning spectrum-specific P-values to protein identifications by mass spectrometry
MOTIVATION Although many methods and statistical approaches have been developed for protein identification by mass spectrometry, the problem of accurate assessment of statistical significance of protein identifications remains an open question. The main issues are as follows: (i) statistical significance of inferring peptide from experimental mass spectra must be platform independent and spectr...
Probity: a protein identification algorithm with accurate assignment of the statistical significance of the results.
An algorithm for protein identification based on mass spectrometric proteolytic peptide mapping and genome database searching is presented. The algorithm ranks database proteins based on direct calculation of the probability of random matching and assigns the statistical significance to each result. We investigate the performance of the algorithm by simulation and show that the algorithm respon...
Protein identification in complex mixtures.
This paper investigates the prospects of successful mass spectrometric protein identification based on mass data from proteolytic digests of complex protein mixtures. Sets of proteolytic peptide masses representing various numbers of digested proteins in a mixture were generated in silico. In each set, different proteins were selected from a protein sequence collection and for each protein the ...
Journal:
- Analytical chemistry
Volume 72, Issue 5
Pages: -
Publication date: 2000